Online Clustering of Data Streams
Abstract
We consider the problem of clustering data streams. A data stream can roughly be thought of as a transient, continuously increasing sequence of time-stamped data. In order to maintain an up-to-date clustering structure, it is necessary to analyze the incoming data in an online manner, tolerating only a constant time delay. For this purpose, we develop an efficient online version of the classical K-means method. Our algorithm's efficiency is mainly due to a (discrete) Fourier transform of the original data, which both smooths and compresses the data.
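The idea of clustering streams in a compressed Fourier domain can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes each stream is represented by a fixed-length window of its most recent values, keeps only the first few DFT coefficients of every window (which drops high-frequency detail and so acts as a smoothing and a compression), and runs plain batch K-means on those coefficients. The function names, the farthest-first initialisation, the window length, and the number of retained coefficients are all assumptions made for illustration.

```python
import numpy as np

def dft_features(windows, n_coeffs):
    """Keep the first n_coeffs DFT coefficients of each window (one window per row)."""
    coeffs = np.fft.rfft(windows, axis=1)[:, :n_coeffs]
    # Stack real and imaginary parts so that Euclidean distance can be used.
    return np.hstack([coeffs.real, coeffs.imag])

def farthest_first_init(points, k):
    """Deterministic initialisation (an assumption of this sketch):
    start from the first point, then repeatedly add the point that is
    farthest from all centres chosen so far."""
    centers = [points[0]]
    for _ in range(k - 1):
        dists = np.min(
            [np.linalg.norm(points - c, axis=1) for c in centers], axis=0
        )
        centers.append(points[dists.argmax()])
    return np.array(centers, dtype=float)

def assign(points, centers):
    """Index of the nearest centre for every point."""
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(points, k, n_iter=20):
    """Plain batch K-means on the rows of points."""
    centers = farthest_first_init(points, k)
    for _ in range(n_iter):
        labels = assign(points, centers)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # leave an empty cluster's centre unchanged
                centers[j] = members.mean(axis=0)
    return assign(points, centers), centers

# Synthetic example: six noisy streams, three per underlying frequency.
rng = np.random.default_rng(0)
window_len = 64
t = np.arange(window_len)
slow = np.sin(2 * np.pi * 1 * t / window_len)
fast = np.sin(2 * np.pi * 3 * t / window_len)
windows = np.vstack([slow] * 3 + [fast] * 3)
windows = windows + 0.1 * rng.standard_normal(windows.shape)

features = dft_features(windows, n_coeffs=8)  # 64 values -> 16 numbers per stream
labels, centers = kmeans(features, k=2)
print(labels)
```

In an online setting the window would slide as new values arrive and the clustering would be refreshed from the previous centres rather than recomputed from scratch; this sketch only shows the compression and clustering step on a single snapshot.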
Similar resources
Probability Density Grid-based Online Clustering for Uncertain Data Streams
Most existing stream clustering algorithms combine an online component with an offline component. The disadvantage of such two-phase algorithms is that they cannot generate the final clusters online; accurate clustering results have to be obtained through offline analysis. Furthermore, the clustering algorithms for uncertain data streams are unable to find clusters of arbitrary shapes accordi...
Online clustering of parallel data streams
In recent years, the management and processing of so-called data streams has become a topic of active research in several fields of computer science, such as distributed systems, database systems, and data mining. A data stream can roughly be thought of as a transient, continuously increasing sequence of time-stamped data. In this paper, we consider the problem of clustering parallel stre...
On clustering large number of data streams
Data streams and their applications appear in several fields such as physics, finance, medicine, environmental science, etc. As sensor technology improves, sensor data rates continue to increase. Consequently, analyzing data streams becomes ever more challenging. Fast online response is a must for applications that involve multiple data streams, especially when the number of data streams is lar...
Online-Data-Mining auf Datenströmen: Methoden zur Clusteranalyse und Klassifikation
• J. Beringer and E. Hüllermeier. Efficient instance based learning on data streams. Adaptive optimization of the number of clusters in fuzzy clustering. Fuzzy clustering of parallel data streams. Adaptive optimization of the number of clusters in fuzzy clustering.
Benchmarking Stream Clustering Algorithms within the MOA Framework
In today’s applications, massive, evolving data streams are ubiquitous. To gain useful information from this data, real-time clustering analysis for streams is needed. A multitude of stream clustering algorithms have been introduced. However, assessing the effectiveness of such an algorithm is challenging, because up to now there is no tool that allows a direct comparison of these algorithms. We pre...
Dynamic Clustering Of High Speed Data Streams
We consider the problem of clustering data streams. A data stream can roughly be thought of as a transient, continuously increasing sequence of time-stamped data. In order to maintain an up-to-date clustering structure, it is necessary to analyze the incoming data in an online manner, tolerating but a constant time delay. The purpose of this study is to analyze the working of popular algorithms...